To answer the research question, “What is the effect over time of music industry influence on musical artists, in terms of musical content?”, measures of popularity, complexity, and outside influence on an artist should be tracked over time. These measures are subjective, but they are justifiable and grounded in the data. Each metric will be computed for every song by a given case-study artist for which valid data exist. The ultimate goal is to compute each metric for all of a case-study artist’s songs that charted at some point on the Billboard Hot 100 and to plot each one against the song’s release date.
Popularity will be measured using data on a song’s life and behavior while it was on the Billboard Hot 100, and multiple metrics will be proposed and compared. The complexity of a song will be measured by combining its musical complexity (where data exist) with its lyrical complexity. Finally, outside influence will be measured primarily by the number of the song’s writers who were not band members.
All the functions will be defined in a separate source file, and then called in this file. All of the data used here should have already been preprocessed in a previous file.
options(kableExtra.auto_format = FALSE)
library(ggrepel)
library(htmltools)
library(tidyverse)
library(tidytext)
library(data.table)
library(plyr)
library(quanteda)
library(kableExtra)
library(knitr)
library(gridExtra)
library(formattable)
library(psych)
library(PerformanceAnalytics)
source("frostFunctions.R")
billboardDf = read_csv("FrostData/billboardDataClean.csv", col_types = cols())
spotifyDf = read_csv("FrostData/spotifyDataClean.csv", col_types = cols())
riaaDf = read_csv("FrostData/riaaDataClean.csv", col_types = cols())
grammyDf = read_csv("FrostData/grammyDataClean.csv", col_types = cols())
songSecsDf = read_csv("FrostData/songSectionDataClean.csv", col_types = cols())
songAttrsDf = read_csv("FrostData/songAttrsDataClean.csv", col_types = cols())
Here the functionality will be built to join all data on a chosen artist. For the examples to follow, the band “Maroon 5” will be used.
archArtist = artistDataJoiner("Maroon 5")
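# Album names must match the data exactly; the stray leading space before "Overexposed" would drop that album
validAlbums = c("Red Pill Blues + (Deluxe)", "v (Deluxe)", "Overexposed Track by Track", "Hands all over (Deluxe)", "it Won't be Soon Before Long.", "Songs About Jane")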
archArtist = filter(archArtist, Album %in% validAlbums)
#archArtist
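As a rough illustration of what a joiner like artistDataJoiner has to do, the sketch below left-joins several per-song tables onto a base chart table. It is hypothetical: the real function lives in frostFunctions.R and is not shown here, and the helper name joinArtistSources, the key column Name, and the title normalization are assumptions. Normalizing titles matters because a trailing space or a capitalization difference silently drops rows from an exact-match join or filter.

```r
# Hypothetical sketch: left-join per-song tables onto a base (e.g. Billboard) table.
# Assumes every table has a song-title column named `Name`.
joinArtistSources <- function(base, ..., key = "Name") {
  normalize <- function(df) {
    df[[key]] <- tolower(trimws(df[[key]]))  # "Song A " and "song a" now match
    df
  }
  tables <- lapply(list(base, ...), normalize)
  # all.x = TRUE keeps every base row even when a source has no matching song
  Reduce(function(left, right) merge(left, right, by = key, all.x = TRUE), tables)
}

# Toy example with made-up values (not taken from the FrostData files)
chart  <- data.frame(Name = c("Song A", "Song B"), peak = c(2, 24))
riaa   <- data.frame(Name = "song a ", RiaaStatus = "Gold")
joined <- joinArtistSources(chart, riaa)
```

Songs with no match in a source (here "song b") keep their chart row and get NA for that source's columns.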
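Popularity will be quantified in multiple ways to account for the imperfections of each individual metric, and the resulting metrics can be compared and contrasted across songs. For a song with chart weeks $w = 1, \dots, W$, let $\text{current}_w$ be its Hot 100 position in week $w$, $\text{weeks}_w$ its number of weeks on the chart as of week $w$, and $\text{peak}_w$ its peak position as of week $w$. The metrics are:

$$\text{pop1} = \sum_{w=1}^{W} \frac{\text{weeks}_w}{\text{current}_w} \qquad \text{pop2} = \sum_{w=1}^{W} \frac{1}{\text{current}_w}$$

$$\text{pop3} = \ln\left(101.1 - \min_{w} \text{peak}_w\right) \qquad \text{pop4} = \frac{1}{W} \sum_{w=1}^{W} \ln\left(101.1 - \text{current}_w\right)$$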
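Written in R, the four formulas fit in one small function. This is a sketch, not necessarily how getPopularityMetric is implemented: the function name popularityMetrics and the toy chart run are hypothetical, and the inputs assume one song's weekly rows with `current`, `weeks`, and `peak` values as used in the formulas. Note that the 101.1 offset means position 100 contributes ln(1.1) ≈ 0.095 rather than ln(1) = 0.

```r
# Hypothetical helper: the four popularity metrics for ONE song, given its weekly
# chart rows. Assumed inputs: `current` = position that week, `weeks` = weeks on
# chart so far, `peak` = best position reached so far.
popularityMetrics <- function(current, weeks, peak) {
  c(
    pop1 = sum(weeks / current),        # later weeks near the top weigh more
    pop2 = sum(1 / current),            # every week weighted only by position
    pop3 = log(101.1 - min(peak)),      # best position only; log() is the natural log
    pop4 = mean(log(101.1 - current))   # average log-scaled position
  )
}

# Toy four-week run: enters at #10, climbs to #1, then slips to #2
m <- popularityMetrics(current = c(10, 5, 1, 2), weeks = 1:4, peak = c(10, 5, 1, 1))
m["pop1"]  # 1/10 + 2/5 + 3/1 + 4/2 = 5.5
```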
Pop1 rewards songs that reach their peak later in their chart run, so it discriminates against tracks that peak right away and dissipate quickly. Pop2 lacks an appropriate scale: holding the number 2 spot on the Hot 100 counts as only half as valuable as holding the number 1 spot. Pop3 considers only the peak position on the chart, but scales it more appropriately than the first two. Pop4 uses the natural log scale to treat differences in chart position more appropriately, and takes the mean of the log-scaled chart positions to account for both longevity and position.
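The scaling argument can be checked numerically by putting a few chart positions through both transformations. This is an illustrative comparison only; the table name scaleTable is hypothetical. Under the reciprocal used by pop1 and pop2, number 1 is worth exactly twice number 2, while under the log transformation used by pop3 and pop4 they differ by only about 0.2%.

```r
# Compare the reciprocal weighting (pop1/pop2) with the log weighting (pop3/pop4)
positions <- c(1, 2, 10, 50, 100)
scaleTable <- data.frame(
  position    = positions,
  recipWeight = 1 / positions,             # #1 = 1, #2 = 0.5, #100 = 0.01
  logWeight   = log(101.1 - positions)     # #1 ~ 4.61, #2 ~ 4.60, #100 ~ 0.095
)
scaleTable
```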
archArtistPop = getPopularityMetric(archArtist)
#archArtistPop
ggplot(archArtistPop, aes(x = ReleaseDate, y = pop1)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Pop1 of Maroon 5 Songs By Release Date")
ggplot(archArtistPop, aes(x = ReleaseDate, y = pop2)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Pop2 of Maroon 5 Songs By Release Date")
ggplot(archArtistPop, aes(x = ReleaseDate, y = pop3)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Pop3 of Maroon 5 Songs By Release Date")
ggplot(archArtistPop, aes(x = ReleaseDate, y = pop4)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Pop4 of Maroon 5 Songs By Release Date")
write_csv(archArtistPop, "maroon5_pop.csv")
By every popularity metric, there is an indication of a slight increase in the popularity of Maroon 5’s charting songs over time. In pop1 and pop2, a high-leverage point likely has some influence on the exact slope of the best-fit line.
To measure how much outside influence went into the creation of a song, the number of the song’s writers who are not the artist themselves is counted.
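The counting step amounts to a set difference between each song's credited writers and the list of band members. The sketch below assumes writers are stored as one character vector per song, which may not match the format getOutsideInfluenceScore actually uses; the helper name countOutsideWriters and the "Writer X/Y" names are hypothetical. Because it matches names exactly, a member credited under a name variant would be counted as an outside writer, so the member list has to use the same spellings as the credits.

```r
# Hypothetical sketch: count credited writers who are not band members.
# `writerList` holds one character vector of writer names per song (assumed format).
countOutsideWriters <- function(writerList, members) {
  vapply(writerList,
         function(writers) length(setdiff(trimws(writers), members)),  # distinct non-members
         integer(1))
}

members <- c("Adam Levine", "James Valentine")
songs <- list(c("Adam Levine", "James Valentine"),
              c("Adam Levine", "Writer X", "Writer Y "))
outside <- countOutsideWriters(songs, members)  # returns c(0L, 2L)
```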
maroon5Members= c("Adam Levine", "Jesse Carmichael", "Mickey Madden", "James Valentine", "Matt Flynn", "PJ Morton", "Sam Farrar", "Ryan Dusick")
archArtistInfluence = getOutsideInfluenceScore(archArtist, maroon5Members)
#archArtistInfluence
ggplot(archArtistInfluence, aes(x = ReleaseDate, y = nonBandMemberWriters)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Number of non-Band Member Writers on Maroon 5's Songs Over Time")
All of Maroon 5’s Billboard Hot 100 charting songs were written entirely by members of Maroon 5 until just before 2015. After that point, every charting song has multiple writing credits given to writers outside Maroon 5, which indicates increased outside influence on the band’s later music.
Some further preprocessing is done to tidy the lyric data. Then the total number of words and the number of unique non-stop words are counted, and the number of unique words divided by the total number of words is used as a measure of lyrical repetition in the song. In addition, the average word length, the average number of syllables per word, and the number of words divided by the song’s length in seconds (words per second) are recorded. Repetition, the ratio of unique words to total words, is considered most important and is weighted most heavily. Average word length is considered second most important and is weighted just below repetition, while the average number of syllables per word and the number of words per second are weighted the lightest.
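A minimal sketch of these features and their weighted combination follows. Everything specific here is an assumption rather than the logic of getLyricalComplexity: the tokenizer, the tiny stop-word list, the vowel-group syllable heuristic (which overcounts words such as "love"), the weights 0.4 / 0.3 / 0.15 / 0.15 (chosen only to respect the stated ordering), and the helper names. Each feature is standardized across songs before weighting so that features on different scales can be summed.

```r
# Hypothetical sketch of the lyrical features; tokenizer, stop words, syllable
# heuristic, and weights are illustrative assumptions.
lyricFeatures <- function(lyrics, durationSec,
                          stopWords = c("the", "a", "and", "i", "you", "to")) {
  words <- tolower(unlist(strsplit(lyrics, "[^A-Za-z']+")))
  words <- words[words != ""]
  contentWords <- words[!words %in% stopWords]
  # Crude syllable guess: count runs of vowels, with at least one per word
  syllables <- pmax(1, lengths(regmatches(words, gregexpr("[aeiouy]+", words))))
  c(uniqueRatio    = length(unique(contentWords)) / length(words),
    meanWordLength = mean(nchar(words)),
    meanSyllables  = mean(syllables),
    wordsPerSec    = length(words) / durationSec)
}

# Standardize each feature across songs, then take a weighted sum
lyricalComplexityScore <- function(features,
                                   weights = c(uniqueRatio = 0.4, meanWordLength = 0.3,
                                               meanSyllables = 0.15, wordsPerSec = 0.15)) {
  z <- scale(features[, names(weights), drop = FALSE])
  as.vector(z %*% weights)
}

# Toy lyrics and durations (made up)
lyrics <- c("la la la love you",
            "wandering through remembered streets tonight",
            "the night and the day")
durations <- c(10, 5, 8)
features <- t(mapply(lyricFeatures, lyrics, durations, USE.NAMES = FALSE))
scores <- lyricalComplexityScore(features)
```

Because the scores are weighted sums of z-scores, they average 0 across the songs passed in, which fits the later step of combining standardized lyrical and musical complexities.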
lyricalComplexDf = getLyricalComplexity(archArtist, TRUE)
## Joining, by = "word"
#lyricalComplexDf
ggplot(lyricalComplexDf, aes(x = ReleaseDate, y = lyricalComplexity)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Lyrical Complexity of Maroon 5 Songs by Release Date")
When the lyrical complexity of Maroon 5’s Billboard charting songs is plotted against release date, there is an apparent negative association between lyrical complexity and time. By far the least lyrically complex song is one of the more recent ones, released around 2017. This is worth noting, but it should also be said that this point and a point around 2002 with a very high score are both arguably high-leverage points.
Previously, the music data were held for each section of each song, but they need to be aggregated to the song level. For each song, the number of unique chords, non-diatonic chords, extended chords, and sections, along with the number of section endings that differ, are recorded. Not all songs have chord data, so these songs receive a musical complexity of 0 after the complexity scores of the songs with data are standardized. This ensures that missing chord data do not affect a song’s total complexity, which is calculated later from the standardized lyrical and musical complexities.
The musical complexity score weights the number of non-diatonic chords (chords outside the song’s key, which the listener does not expect to hear) and the number of unique chords more heavily than the number of extended chords and the number of differing sections, since the latter are arguably less definitive measures of musical complexity.
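The weighting and the zero-fill for missing chord data can be sketched as follows. The column names, the weights (0.35 / 0.35 / 0.15 / 0.15, chosen only to respect the stated ordering), and the helper name musicalComplexityScore are assumptions, not the internals of getMusicComplexity. The zero-fill works because a weighted sum of z-scores averages exactly 0 over the songs that have data, so songs without chord data are placed at that average and neither raise nor lower the comparison.

```r
# Hypothetical sketch: weighted musical complexity, 0 for songs lacking chord data.
# Column names and weights are illustrative assumptions.
musicalComplexityScore <- function(df,
                                   weights = c(uniqueChords = 0.35, nonDiatonicChords = 0.35,
                                               extendedChords = 0.15, differentSections = 0.15)) {
  cols <- names(weights)
  hasData <- complete.cases(df[, cols])
  score <- rep(0, nrow(df))                  # songs without chord data stay at 0
  z <- scale(as.matrix(df[hasData, cols]))   # standardize among songs that have data
  score[hasData] <- as.vector(z %*% weights)
  score
}

# Toy chord counts (made up); the third song has no chord data
songs <- data.frame(uniqueChords      = c(4, 6, NA, 9),
                    nonDiatonicChords = c(0, 1, NA, 3),
                    extendedChords    = c(0, 2, NA, 1),
                    differentSections = c(1, 2, NA, 3))
s <- musicalComplexityScore(songs)
```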
musicComplexDf = getMusicComplexity(archArtist, TRUE)
#musicComplexDf
ggplot(musicComplexDf, aes(x = ReleaseDate, y = musicalComplexity)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Musical Complexity of Maroon 5 Songs By Release Date")
There is less chord data to work with, but what exists shows Maroon 5 scoring lower on musical complexity in their later songs than in their earlier material.
Now all of the smaller metric datasets will be joined and all of the columns other than the count of writers who are not in the band will be standardized.
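The standardization step can be sketched in base R. The real work happens inside fullMetricsDataSet, whose internals are not shown, so the helper name standardizeExcept and its exact behavior are assumptions. It z-scores every numeric column of whatever data frame it receives (for example, one artist's songs) except the writer count; Date columns such as ReleaseDate are left alone because is.numeric() is FALSE for Date objects.

```r
# Hypothetical sketch: z-score every numeric column except the writer count.
standardizeExcept <- function(df, keep = "nonBandMemberWriters") {
  target <- vapply(df, is.numeric, logical(1)) & !(names(df) %in% keep)
  df[target] <- lapply(df[target], function(x) as.vector(scale(x)))
  df
}

# Toy data (made up)
toy <- data.frame(Name = c("A", "B", "C"),
                  ReleaseDate = as.Date(c("2002-06-25", "2010-06-22", "2018-05-30")),
                  pop1 = c(0.5, 2.0, 9.5),
                  nonBandMemberWriters = c(0L, 2L, 6L))
std <- standardizeExcept(toy)
```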
artistMetricDf = fullMetricsDataSet(archArtistPop, archArtistInfluence, lyricalComplexDf, musicComplexDf, TRUE)
#artistMetricDf
#artistMetricDf %>%
# select(Name, pop1, pop2, pop3, pop4) %>%
# gather(key = "Metric", value = "Score", -Name) %>%
# ggplot(aes(x = Name, y = Score, fill = Metric)) + geom_col(position = "dodge") + labs(title = "Comparison of Popularity Metrics Across Maroon 5 Billboard Hot 100 Songs") + theme(axis.text.x = element_text(angle = 90))
Now that all of the metric data are collected alongside the original data on the artist’s songs, tracks can be compared directly with each other in meaningful ways. First, a function compares chosen tracks by a particular artist. It produces plots tracking each song’s life on the Billboard Hot 100 and a plot of each track’s weekly contribution to the pop1 metric, followed by formatted, color-coded tables summarizing the metric data. Standardized values below the artist’s average (negative) are shown in red, and those above it (positive) are shown in green. This is run on a selection of Maroon 5 songs below.
#The pop1 metric can be tracked week by week because weeks on chart changes over the run
#Join the originality and complexity metrics, which are attached to the song rather than changing by week
tables = compareTracks(c("She will be loved", "Girls like you"), archArtist, artistMetricDf)
#tables = compareTracks(c("She will be loved", "Harder to Breathe", "Wait", "Sugar"), archArtist, artistMetricDf)
kable(tables[[1]]) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
as.data.frame(tables[[2]]) %>%
  # Green for above-average (positive) standardized scores, red otherwise
  mutate(pop1 = cell_spec(pop1, "html", color = ifelse(pop1 > 0, "green", "red")),
         pop2 = cell_spec(pop2, "html", color = ifelse(pop2 > 0, "green", "red")),
         pop3 = cell_spec(pop3, "html", color = ifelse(pop3 > 0, "green", "red")),
         pop4 = cell_spec(pop4, "html", color = ifelse(pop4 > 0, "green", "red"))) %>%
  kable("html", escape = FALSE) %>%
  kable_styling()
| Name | ReleaseDate | pop1 | pop2 | pop3 | pop4 | GrammyAward | RiaaStatus |
|---|---|---|---|---|---|---|---|
| Girls Like you | 2018-05-30 | 3.18889267498101 | 3.0525303855611 | 0.783685927715491 | 1.02741452937264 | NA | 1x Platinum |
| She will be Loved | 2002-06-25 | 0.192183699021031 | 0.148605952040361 | 0.703917878435716 | 0.892206551534005 | NA | 4x Multi-Platinum |
as.data.frame(tables[[3]]) %>%
  mutate(totalComplexity = cell_spec(totalComplexity, "html",
                                     color = ifelse(totalComplexity > 0, "green", "red"))) %>%
  kable("html", escape = FALSE) %>%
  kable_styling()
| Name | ReleaseDate | totalComplexity |
|---|---|---|
| She will be Loved | 2002-06-25 | 0.465584251357378 |
| Girls Like you | 2018-05-30 | -0.838057270738207 |
| Girls Like you | 2018-05-30 | -0.941449625856669 |
Comparing an older, very popular Maroon 5 charting song, “She Will Be Loved”, with a newer song that is indicated as their most commercially popular, “Girls Like You”, shows that small differences in how the tracks performed on the chart made large differences in their contributions to the pop1 metric. “Girls Like You” was on the chart for about 10 weeks longer, which certainly indicates greater popularity and was rewarded with continual additions to its pop1 score, but it also held its peak position at number 1 for much longer. “She Will Be Loved” climbed to its peak and fell out of the top 10 in the same number of weeks that “Girls Like You” held the top spot. Holding the top spot was heavily rewarded with larger pop1 additions each week, and this is the main reason “Girls Like You” received such a high value on that metric.
In addition, the tables indicate that “Girls Like You” had 6 outside writers, whereas “She Will Be Loved” had none. Both songs had above-average popularity for Maroon 5 songs, but “Girls Like You” scored higher across the metrics. Notably, “She Will Be Loved” had above-average complexity, while “Girls Like You” was about 1 standard deviation below the average; its two versions differ only slightly, with the version containing a rap verse scoring as somewhat more complex.
All of the functionality built above should apply to any artist for which valid data are available. The full pipeline of function calls is below. Its inputs are the artist name, the names of the writing-credit holders who represent the artist or group, the valid albums to consider, a vector of songs to exclude (for example, covers), and a logical flag that controls whether the metrics are standardized.
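A toy calculation makes the mechanism concrete. Since each week adds weeks / current to pop1, a song sitting at number 1 in its seventh week adds 7 points in that week alone, while a song at number 90 in its seventh week adds less than 0.08. The two trajectories below are made up, and the helper weeklyPop1 assumes consecutive chart weeks with no re-entries.

```r
# Per-week pop1 contribution = weeks on chart / current position.
# Assumes consecutive chart weeks with no re-entries, so weeks is 1, 2, 3, ...
weeklyPop1 <- function(current) seq_along(current) / current

holdsTop  <- c(50, 20, 5, 1, 1, 1, 1)     # hypothetical: climbs, then holds #1
fadesFast <- c(10, 5, 8, 20, 40, 60, 90)  # hypothetical: peaks at #5, then falls
cumsum(weeklyPop1(holdsTop))   # ends at 22.72
cumsum(weeklyPop1(fadesFast))  # ends near 1.38
```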
#Need to pass artist, and valid albs
maroon5Metrics = completeArchDf("Maroon 5", c("Adam Levine", "Jesse Carmichael", "Mickey Madden", "James Valentine", "Matt Flynn", "PJ Morton", "Sam Farrar", "Ryan Dusick"), c("Red Pill Blues + (Deluxe)", "v (Deluxe)", "Overexposed Track by Track", "Hands all over (Deluxe)", "it Won't be Soon Before Long.", "Songs About Jane"), c(), TRUE) #May be 2 versions of girls like you - one with rap and one without
## Joining, by = "word"
singleArtistVisual("Maroon 5",maroon5Metrics)
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
##
## Call:
## lm(formula = fullMetric$totalComplexity ~ fullMetric$ReleaseDate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8623 -0.8262 -0.1167 0.4326 4.0794
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.2439757 1.8417639 2.847 0.00776 **
## fullMetric$ReleaseDate -0.0003467 0.0001210 -2.865 0.00742 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.18 on 31 degrees of freedom
## Multiple R-squared: 0.2094, Adjusted R-squared: 0.1839
## F-statistic: 8.209 on 1 and 31 DF, p-value: 0.007421
From the simple linear regressions of each of the three metrics against the release date of all of Maroon 5’s charting songs, it seems there was a moderate decrease in their song complexity (p ≈ 0.007), a slight increase in their song popularity, and a drastic increase in outside influence on their music as time passed.
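The slopes in these summaries look tiny because lm treats a Date predictor as days since 1970-01-01, so each coefficient is a change per day. Multiplying by 365.25 gives a per-year rate: for Maroon 5, −0.0003467 × 365.25 ≈ −0.13 units of the standardized complexity score per year. The hypothetical helper trendPerYear below refits with time in years; the toy data are made up, and the check confirms that rescaling changes the estimate but leaves the t value and p-value untouched.

```r
# Hypothetical helper: fit metric ~ release date with time measured in years.
trendPerYear <- function(df, metric) {
  df$releaseYears <- as.numeric(df$ReleaseDate) / 365.25  # days since 1970-01-01 -> years
  fit <- lm(reformulate("releaseYears", response = metric), data = df)
  coef(summary(fit))["releaseYears", ]
}

# Toy data (made up), compared with a per-day fit like the ones reported above
toy <- data.frame(ReleaseDate = as.Date(c("2002-06-25", "2007-05-01", "2010-09-21",
                                          "2014-09-02", "2018-05-30")),
                  totalComplexity = c(1.0, 0.6, 0.2, -0.9, -0.9))
perYear <- trendPerYear(toy, "totalComplexity")
perDay  <- coef(summary(lm(totalComplexity ~ as.numeric(ReleaseDate), data = toy)))[2, ]
```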
justinTimberlakeMetrics = completeArchDf("Justin Timberlake", c("Justin Timberlake"), c("Justified", "Man of the Woods", "The 20/20 Experience - 2 of 2 (Deluxe)", "The 20/20 Experience (Deluxe Version)", "Futuresex/Lovesounds Deluxe Edition"), c(), TRUE)
## Joining, by = "word"
singleArtistVisual("Justin Timberlake",justinTimberlakeMetrics)
##
## Call:
## lm(formula = fullMetric$totalComplexity ~ fullMetric$ReleaseDate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4169 -0.6048 0.3002 0.7639 1.3309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3010869 1.7359658 0.173 0.864
## fullMetric$ReleaseDate -0.0000184 0.0001143 -0.161 0.874
##
## Residual standard error: 1.005 on 16 degrees of freedom
## Multiple R-squared: 0.001617, Adjusted R-squared: -0.06078
## F-statistic: 0.02592 on 1 and 16 DF, p-value: 0.8741
From the simple linear regressions of each of the three metrics against the release date of all of Justin Timberlake’s charting songs, it seems there was an insignificant change in his song complexity (p ≈ 0.87), a very slight decrease in his song popularity, and a slight increase in outside influence on his music as time passed.
twentyOnePilotsMetrics = completeArchDf("Twenty One Pilots", c("Tyler Joseph", "Josh Dun", "Nick Thomas", "Chris Salih"), c("Trench", "Blurryface","Vessel (with Bonus Tracks)", "Twenty One Pilots"), c("Cancer"), TRUE)#Cancer was a cover so it is excluded even though it made the chart
## Joining, by = "word"
singleArtistVisual("Twenty One Pilots", twentyOnePilotsMetrics)
##
## Call:
## lm(formula = fullMetric$totalComplexity ~ fullMetric$ReleaseDate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3339 -0.6712 -0.1554 0.5499 1.6170
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.2306247 11.5187697 0.454 0.666
## fullMetric$ReleaseDate -0.0003069 0.0006755 -0.454 0.666
##
## Residual standard error: 1.062 on 6 degrees of freedom
## Multiple R-squared: 0.03326, Adjusted R-squared: -0.1279
## F-statistic: 0.2064 on 1 and 6 DF, p-value: 0.6656
From the simple linear regressions of each of the three metrics against the release date of all of Twenty One Pilots’ charting songs, it seems there was a very slight decrease in their song complexity (p ≈ 0.67), a slight decrease in their song popularity, and no change in outside influence on their music as time passed.
fooFightersMetrics = completeArchDf("Foo Fighters", c("Dave Grohl", "Nate Mendel", "Pat Smear", "Taylor Hawkins", "Chris Shiflett", "Rami Jaffee", "William Goldsmith", "Franz Stahl"), c("Wasting Light", "Echoes, Silence, Patience & Grace", "In your Honor", "One by One (Expanded Edition)", "There is Nothing Left to Lose", "The Colour and the Shape", "Concrete and Gold", "Foo Fighters", "Sonic Highways"), c(), TRUE)
## Joining, by = "word"
singleArtistVisual("Foo Fighters", fooFightersMetrics)
##
## Call:
## lm(formula = fullMetric$totalComplexity ~ fullMetric$ReleaseDate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8654 -0.5308 0.1518 0.3827 0.7732
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.6339632 2.0759554 -3.677 0.00624 **
## fullMetric$ReleaseDate 0.0005724 0.0001549 3.695 0.00608 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6447 on 8 degrees of freedom
## Multiple R-squared: 0.6306, Adjusted R-squared: 0.5844
## F-statistic: 13.65 on 1 and 8 DF, p-value: 0.006085
From the simple linear regressions of each of the three metrics against the release date of all of the Foo Fighters’ charting songs, it seems there was a significant increase in their song complexity (p ≈ 0.006), a significant decrease in their song popularity, and no change in outside influence on their music as time passed.
taylorSwiftMetrics = completeArchDf("Taylor Swift", c("Taylor Swift"), c("Reputation", "1989 (Deluxe Edition)", "Red (Deluxe Edition)", "Speak Now (Deluxe Edition)", "Fearless (Platinum Edition)", "Taylor Swift"), c(), TRUE )
## Joining, by = "word"
singleArtistVisual("Taylor Swift",taylorSwiftMetrics)
##
## Call:
## lm(formula = fullMetric$totalComplexity ~ fullMetric$ReleaseDate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.32948 -0.77178 0.09415 0.71349 2.41927
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.6341690 1.8710342 1.408 0.164
## fullMetric$ReleaseDate -0.0001717 0.0001216 -1.412 0.163
##
## Residual standard error: 1.173 on 69 degrees of freedom
## Multiple R-squared: 0.02807, Adjusted R-squared: 0.01399
## F-statistic: 1.993 on 1 and 69 DF, p-value: 0.1625
From the simple linear regressions of each of the three metrics against the release date of all of Taylor Swift’s charting songs, it seems there was a slight decrease in her song complexity (p ≈ 0.16), a very slight increase in her song popularity, and a significant increase in outside influence on her music as time passed.
justinBieberMetrics = completeArchDf("Justin Bieber", c("Justin Bieber"), c("Purpose (Deluxe)", "Journals", "Believe (Deluxe Edition)", "under the Mistletoe (Deluxe Edition)", "My World 2.0", "Never Say Never - The Remixes", "My World"), c(), TRUE)
## Joining, by = "word"
singleArtistVisual("Justin Bieber",justinBieberMetrics)
##
## Call:
## lm(formula = fullMetric$totalComplexity ~ fullMetric$ReleaseDate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.21817 -0.51190 0.03743 0.53507 2.66832
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.249e+00 2.484e+00 -0.503 0.617
## fullMetric$ReleaseDate 7.961e-05 1.580e-04 0.504 0.617
##
## Residual standard error: 1.026 on 51 degrees of freedom
## Multiple R-squared: 0.004953, Adjusted R-squared: -0.01456
## F-statistic: 0.2538 on 1 and 51 DF, p-value: 0.6166
From the simple linear regressions of each of the three metrics against the release date of all of Justin Bieber’s charting songs, it seems there was a very slight increase in his song complexity (p ≈ 0.62), a slight increase in his song popularity, and an insignificant change in outside influence on his music as time passed.
britneySpearsMetrics= completeArchDf("Britney Spears", c("Britney Spears"), c("Britney Jean (Deluxe Version)", "Femme Fatale (Deluxe Version)", "Circus (Deluxe Version)", "Blackout", "In the Zone", "Britney (Digital Deluxe Version)", "Oops!... i Did it Again", "...baby One more Time (Digital Deluxe Version)", "Glory (Deluxe Version)") ,c(), TRUE)
## Joining, by = "word"
singleArtistVisual("Britney Spears",britneySpearsMetrics)
##
## Call:
## lm(formula = fullMetric$totalComplexity ~ fullMetric$ReleaseDate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.37977 -0.46568 0.09747 0.77227 1.30747
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.3201364 1.6363225 -1.418 0.168
## fullMetric$ReleaseDate 0.0001655 0.0001160 1.427 0.165
##
## Residual standard error: 0.982 on 27 degrees of freedom
## Multiple R-squared: 0.07011, Adjusted R-squared: 0.03567
## F-statistic: 2.036 on 1 and 27 DF, p-value: 0.1651
From the simple linear regressions of each of the three metrics against the release date of all of Britney Spears’ charting songs, it seems there was a moderate increase in her song complexity (p ≈ 0.17), a very slight increase in her song popularity, and a moderate increase in outside influence on her music as time passed.
jColeMetrics = completeArchDf("j Cole", c("j Cole"), c("Revenge of the Dreamers Iii", "Kod", "2014 Forest Hills Drive", "Cole World: The Sideline Story", "4 your Eyez Only", "Born Sinner", "The Blow Up"), c(), TRUE)
## Joining, by = "word"
singleArtistVisual("j Cole", jColeMetrics)
##
## Call:
## lm(formula = fullMetric$totalComplexity ~ fullMetric$ReleaseDate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.81566 -0.63115 0.07773 0.66597 1.92082
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.0553501 3.1151674 -2.265 0.0293 *
## fullMetric$ReleaseDate 0.0004128 0.0001821 2.267 0.0291 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9508 on 38 degrees of freedom
## Multiple R-squared: 0.1192, Adjusted R-squared: 0.096
## F-statistic: 5.141 on 1 and 38 DF, p-value: 0.02913
From the simple linear regressions of each of the three metrics against the release date of all of J Cole’s charting songs, it seems there was a moderate increase in his song complexity (p ≈ 0.03), a moderate decrease in his song popularity, and a decrease in outside influence on his music as time passed.
maroon5Metrics$label2 = maroon5Metrics$ReleaseDate > as.Date("2014-01-01")
maroon5Metrics$label2
## [1] TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
summary(lm(pop1~totalComplexity,data = maroon5Metrics))
##
## Call:
## lm(formula = pop1 ~ totalComplexity, data = maroon5Metrics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5810 -0.5374 -0.4279 0.0819 3.4326
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.211e-17 1.768e-01 0.000 1.000
## totalComplexity -1.806e-02 1.375e-01 -0.131 0.896
##
## Residual standard error: 1.016 on 31 degrees of freedom
## Multiple R-squared: 0.0005561, Adjusted R-squared: -0.03168
## F-statistic: 0.01725 on 1 and 31 DF, p-value: 0.8964
summary(lm(pop1~ totalComplexity*label2,data= maroon5Metrics))
##
## Call:
## lm(formula = pop1 ~ totalComplexity * label2, data = maroon5Metrics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2140 -0.3502 -0.3022 0.1995 2.9513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.211195 0.213769 -0.988 0.3313
## totalComplexity -0.001883 0.147585 -0.013 0.9899
## label2TRUE 0.696501 0.388026 1.795 0.0831 .
## totalComplexity:label2TRUE 0.192628 0.398698 0.483 0.6326
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9963 on 29 degrees of freedom
## Multiple R-squared: 0.1005, Adjusted R-squared: 0.007442
## F-statistic: 1.08 on 3 and 29 DF, p-value: 0.373
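In an interaction model such as pop1 ~ totalComplexity * label2, the slope for the reference group (release before 2014) is the totalComplexity coefficient, and the slope for the other group is that coefficient plus the interaction term. From the output above, the post-2014 slope is −0.001883 + 0.192628 ≈ 0.19, compared with about −0.002 before 2014, although neither the shift (p ≈ 0.08) nor the interaction (p ≈ 0.63) is significant. The toy data below, with made-up exact lines, show how to recover both slopes from the coefficients.

```r
# Toy data (made up): slope 1 before the break, slope 3 after it
toy <- data.frame(x = c(-1, 0, 1, 2, -1, 0, 1, 2),
                  post = rep(c(FALSE, TRUE), each = 4))
toy$y <- ifelse(toy$post, 1 + 3 * toy$x, 1 * toy$x)

fit <- lm(y ~ x * post, data = toy)
b <- coef(fit)
preSlope  <- b[["x"]]                      # slope for post == FALSE
postSlope <- b[["x"]] + b[["x:postTRUE"]]  # slope for post == TRUE
```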
#lm(pop1~totalComplexity*(totalComplexity<breaks[1]) + totalComplexity*(totalComplexity>=breaks[2]), data = artistMetricDf)
summary(lm(totalComplexity~nonBandMemberWriters,data = maroon5Metrics))
##
## Call:
## lm(formula = totalComplexity ~ nonBandMemberWriters, data = maroon5Metrics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7420 -0.9788 -0.1901 0.5998 4.9100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.30034 0.31373 0.957 0.346
## nonBandMemberWriters -0.14265 0.09875 -1.445 0.159
##
## Residual standard error: 1.303 on 30 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.06503, Adjusted R-squared: 0.03387
## F-statistic: 2.087 on 1 and 30 DF, p-value: 0.1589
summary(lm(totalComplexity~ nonBandMemberWriters*label2,data= maroon5Metrics))
##
## Call:
## lm(formula = totalComplexity ~ nonBandMemberWriters * label2,
## data = maroon5Metrics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8587 -0.9330 -0.1669 0.6297 4.9586
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.25179 0.34268 0.735 0.469
## nonBandMemberWriters -0.08758 0.16934 -0.517 0.609
## label2TRUE 0.36728 1.09195 0.336 0.739
## nonBandMemberWriters:label2TRUE -0.13376 0.28030 -0.477 0.637
##
## Residual standard error: 1.344 on 28 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.07274, Adjusted R-squared: -0.02661
## F-statistic: 0.7322 on 3 and 28 DF, p-value: 0.5415
summary(lm(pop1~nonBandMemberWriters,data = maroon5Metrics))
##
## Call:
## lm(formula = pop1 ~ nonBandMemberWriters, data = maroon5Metrics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.18809 -0.48793 -0.17433 0.05925 2.93248
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.34684 0.22707 -1.527 0.1371
## nonBandMemberWriters 0.16899 0.07147 2.364 0.0247 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9434 on 30 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1571, Adjusted R-squared: 0.129
## F-statistic: 5.59 on 1 and 30 DF, p-value: 0.02473
summary(lm(pop1~nonBandMemberWriters*label2,data = maroon5Metrics))
##
## Call:
## lm(formula = pop1 ~ nonBandMemberWriters * label2, data = maroon5Metrics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.26000 -0.40040 -0.18283 0.09901 2.87826
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.338338 0.248568 -1.361 0.184
## nonBandMemberWriters 0.136980 0.122830 1.115 0.274
## label2TRUE 0.001501 0.792059 0.002 0.999
## nonBandMemberWriters:label2TRUE 0.040855 0.203316 0.201 0.842
##
## Residual standard error: 0.9746 on 28 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1604, Adjusted R-squared: 0.07046
## F-statistic: 1.783 on 3 and 28 DF, p-value: 0.1732
#This is for correlations between popularity metrics
#Should hope to have metrics which are strongly correlated with one another
corr.test(maroon5Metrics[c("pop1", "pop2", "pop3", "pop4")])
## Call:corr.test(x = maroon5Metrics[c("pop1", "pop2", "pop3", "pop4")])
## Correlation matrix
## pop1 pop2 pop3 pop4
## pop1 1.00 0.95 0.40 0.49
## pop2 0.95 1.00 0.47 0.55
## pop3 0.40 0.47 1.00 0.91
## pop4 0.49 0.55 0.91 1.00
## Sample Size
## [1] 33
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## pop1 pop2 pop3 pop4
## pop1 0.00 0.00 0.02 0.01
## pop2 0.00 0.00 0.01 0.00
## pop3 0.02 0.01 0.00 0.00
## pop4 0.00 0.00 0.00 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
chart.Correlation(maroon5Metrics[c("pop1", "pop2", "pop3", "pop4")])
#Need to change this to be 0
#The correlations between lyrical and musical complexity
#Not really looking at strong correlation as a success, there can be music that is musically simple and lyrically very involved and complex
corr.test(maroon5Metrics[c("lyricalComplexity", "musicalComplexity")])
## Call:corr.test(x = maroon5Metrics[c("lyricalComplexity", "musicalComplexity")])
## Correlation matrix
## lyricalComplexity musicalComplexity
## lyricalComplexity 1.0 0.4
## musicalComplexity 0.4 1.0
## Sample Size
## [1] 33
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## lyricalComplexity musicalComplexity
## lyricalComplexity 0.00 0.02
## musicalComplexity 0.02 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
chart.Correlation(maroon5Metrics[c("lyricalComplexity", "musicalComplexity")])
maroon5Metrics$Artist = "Maroon 5"
maroon5Metrics = maroon5Metrics[,(names(maroon5Metrics) !="label2")]
justinTimberlakeMetrics$Artist = "Justin Timberlake"
justinBieberMetrics$Artist = "Justin Bieber"
twentyOnePilotsMetrics$Artist = "Twenty One Pilots"
britneySpearsMetrics$Artist = "Britney Spears"
jColeMetrics$Artist = "J Cole"
taylorSwiftMetrics$Artist = "Taylor Swift"
fooFightersMetrics$Artist = "Foo Fighters"
allArtistMetrics = rbind(maroon5Metrics, justinTimberlakeMetrics, justinBieberMetrics, twentyOnePilotsMetrics, britneySpearsMetrics, jColeMetrics, taylorSwiftMetrics, fooFightersMetrics)
allArtistMetrics
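The repeated completeArchDf calls and Artist assignments above could be collapsed into a single loop over a list of artist specifications. The wrapper runArtists and the spec fields (members, albums, exclude, label) are hypothetical; the only assumption about the pipeline is that it takes arguments in the order completeArchDf uses (artist, members, albums, excluded songs, standardize flag). A stub stands in for completeArchDf so the sketch runs on its own, and passing standardize = FALSE would mirror the unstandardized runs.

```r
# Hypothetical wrapper: run a per-artist pipeline over a list of specs and stack
# the results with an Artist column.
runArtists <- function(specs, pipeline, standardize = TRUE) {
  results <- lapply(names(specs), function(artist) {
    spec <- specs[[artist]]
    out <- pipeline(artist, spec$members, spec$albums, spec$exclude, standardize)
    out$Artist <- if (is.null(spec$label)) artist else spec$label  # display name
    out
  })
  do.call(rbind, results)
}

# Stub pipeline for demonstration only; in the report this would be completeArchDf
stubPipeline <- function(artist, members, albums, exclude, standardize) {
  data.frame(Name = paste(artist, seq_along(albums)), nWriters = length(members))
}
specs <- list(
  "j Cole" = list(members = "j Cole", albums = c("Kod", "Born Sinner"), label = "J Cole"),
  "Twenty One Pilots" = list(members = c("Tyler Joseph", "Josh Dun"),
                             albums = "Trench", exclude = "Cancer")
)
combined <- runArtists(specs, stubPipeline)
```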
The regressions and correlations run for Maroon 5 alone are now repeated on the combined data for all eight artists.
summary(lm(pop1~totalComplexity,data = allArtistMetrics))
##
## Call:
## lm(formula = pop1 ~ totalComplexity, data = allArtistMetrics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8858 -0.4904 -0.3379 -0.0151 4.6874
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.003915 0.060688 0.065 0.9486
## totalComplexity -0.100315 0.056221 -1.784 0.0755 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9823 on 260 degrees of freedom
## Multiple R-squared: 0.0121, Adjusted R-squared: 0.008297
## F-statistic: 3.184 on 1 and 260 DF, p-value: 0.07554
summary(lm(totalComplexity~nonBandMemberWriters,data = allArtistMetrics))
##
## Call:
## lm(formula = totalComplexity ~ nonBandMemberWriters, data = allArtistMetrics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7840 -0.7046 0.1309 0.6568 5.1107
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.09968 0.09440 1.056 0.292
## nonBandMemberWriters -0.05002 0.03369 -1.485 0.139
##
## Residual standard error: 1.081 on 259 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.008443, Adjusted R-squared: 0.004615
## F-statistic: 2.205 on 1 and 259 DF, p-value: 0.1387
summary(lm(pop1~nonBandMemberWriters,data = allArtistMetrics))
##
## Call:
## lm(formula = pop1 ~ nonBandMemberWriters, data = allArtistMetrics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9133 -0.4408 -0.3550 -0.0286 4.8756
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.08890 0.08602 -1.033 0.302
## nonBandMemberWriters 0.04796 0.03069 1.562 0.119
##
## Residual standard error: 0.985 on 259 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.009338, Adjusted R-squared: 0.005513
## F-statistic: 2.441 on 1 and 259 DF, p-value: 0.1194
corr.test(allArtistMetrics[c("pop1", "pop2", "pop3", "pop4")])
## Call:corr.test(x = allArtistMetrics[c("pop1", "pop2", "pop3", "pop4")])
## Correlation matrix
## pop1 pop2 pop3 pop4
## pop1 1.00 0.95 0.38 0.47
## pop2 0.95 1.00 0.42 0.48
## pop3 0.38 0.42 1.00 0.80
## pop4 0.47 0.48 0.80 1.00
## Sample Size
## [1] 262
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## pop1 pop2 pop3 pop4
## pop1 0 0 0 0
## pop2 0 0 0 0
## pop3 0 0 0 0
## pop4 0 0 0 0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
chart.Correlation(allArtistMetrics[c("pop1", "pop2", "pop3", "pop4")])
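The `corr.test` output notes that entries above the diagonal are adjusted for multiple tests. The zeros it prints are p-values rounded to two decimals. The sketch below reproduces that logic with base R (`cor`, `cor.test`, `p.adjust`) on simulated data. It assumes Holm's method, which is taken here to be the default of `corr.test`'s `adjust` argument. The names `demo`, `rMat`, `idx`, and `pTable` are hypothetical.

```r
# Assumption: simulated, correlated stand-ins for pop1..pop4 (NOT the study's data)
set.seed(2)
n <- 262
shared <- rnorm(n)
demo <- data.frame(
  pop1 = shared + rnorm(n, sd = 0.3),
  pop2 = shared + rnorm(n, sd = 0.3),
  pop3 = 0.5 * shared + rnorm(n),
  pop4 = 0.5 * shared + rnorm(n)
)

# Correlation matrix, comparable to the first block of the corr.test output
rMat <- cor(demo)

# Every distinct pair of columns, one pair per column of idx
idx <- combn(ncol(demo), 2)

# Unadjusted two-sided Pearson p-value for each pair
rawP <- apply(idx, 2, function(ij) cor.test(demo[[ij[1]]], demo[[ij[2]]])$p.value)

# Assumption: Holm adjustment across all six pairs, as corr.test is taken to do
holmP <- p.adjust(rawP, method = "holm")

pTable <- data.frame(
  var1 = names(demo)[idx[1, ]],
  var2 = names(demo)[idx[2, ]],
  r = rMat[t(idx)],
  rawP = rawP,
  holmP = holmP
)
print(pTable, digits = 3)
```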
# Keep only songs with a non-zero musicalComplexity value
allArtistMetricsSub = allArtistMetrics[allArtistMetrics$musicalComplexity != 0, ]
corr.test(allArtistMetricsSub[c("lyricalComplexity", "musicalComplexity")])
## Call:corr.test(x = allArtistMetricsSub[c("lyricalComplexity", "musicalComplexity")])
## Correlation matrix
## lyricalComplexity musicalComplexity
## lyricalComplexity 1.00 0.33
## musicalComplexity 0.33 1.00
## Sample Size
## [1] 35
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## lyricalComplexity musicalComplexity
## lyricalComplexity 0.00 0.05
## musicalComplexity 0.05 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
chart.Correlation(allArtistMetricsSub[c("lyricalComplexity", "musicalComplexity")])
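Only 35 songs have musical complexity data, so the lyrical/musical correlation of 0.33 is estimated imprecisely. A quick worked check uses the standard Fisher z-transformation with the rounded values from the output (r = 0.33, n = 35). It gives an approximate 95% confidence interval. This is a sketch for illustration, not the interval `corr.test` itself reports.

```r
# Values read off the corr.test output above (r is rounded to two decimals)
r <- 0.33
n <- 35

# Fisher z-transform: atanh(r) is approximately normal with SE 1 / sqrt(n - 3)
z <- atanh(r)
se <- 1 / sqrt(n - 3)
zCrit <- qnorm(0.975)

# 95% interval on the z scale, mapped back to the correlation scale with tanh
ci <- tanh(z + c(-1, 1) * zCrit * se)
print(round(ci, 3))
```

The interval runs from about -0.004 to 0.60. It only just includes zero, which agrees with the borderline p-value of 0.05 in the output.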
Next, artists are compared with one another using unstandardized (raw) metric values.
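Raw values suit a cross-artist comparison partly because z-scoring within each artist sets every artist's mean to zero, which hides differences in level between artists. The minimal sketch below shows this effect with made-up numbers. The data frame `demo` and the artist names in it are hypothetical.

```r
# Assumption: made-up raw lyrical-complexity scores for two hypothetical artists
demo <- data.frame(
  Artist = rep(c("Artist A", "Artist B"), each = 3),
  lyricalComplexity = c(12, 13, 14, 16, 17, 18)
)

# Raw values keep the gap between the artists' typical levels
rawMeans <- tapply(demo$lyricalComplexity, demo$Artist, mean)

# Z-scoring within each artist removes that gap: every artist's mean becomes 0
demo$zWithinArtist <- ave(demo$lyricalComplexity, demo$Artist,
                          FUN = function(x) as.numeric(scale(x)))
zMeans <- tapply(demo$zWithinArtist, demo$Artist, mean)

print(rawMeans)
print(zMeans)
```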
maroon5Reg = completeArchDf("Maroon 5",
  c("Adam Levine", "Jesse Carmichael", "Mickey Madden", "James Valentine", "Matt Flynn", "PJ Morton", "Sam Farrar", "Ryan Dusick"),
  c("Red Pill Blues + (Deluxe)", "v (Deluxe)", "Overexposed Track by Track", "Hands all over (Deluxe)", "it Won't be Soon Before Long.", "Songs About Jane"),
  c(), FALSE)
## Joining, by = "word"
maroon5Reg$Artist = "Maroon 5"
taylorSwiftReg = completeArchDf("Taylor Swift", c("Taylor Swift"),
  c("Reputation", "1989 (Deluxe Edition)", "Red (Deluxe Edition)", "Speak Now (Deluxe Edition)", "Fearless (Platinum Edition)", "Taylor Swift"),
  c(), FALSE)
## Joining, by = "word"
taylorSwiftReg$Artist = "Taylor Swift"
fooFightersReg = completeArchDf("Foo Fighters",
  c("Dave Grohl", "Nate Mendel", "Pat Smear", "Taylor Hawkins", "Chris Shiflett", "Rami Jaffee", "William Goldsmith", "Franz Stahl"),
  c("Wasting Light", "Echoes, Silence, Patience & Grace", "In your Honor", "One by One (Expanded Edition)", "There is Nothing Left to Lose", "The Colour and the Shape", "Concrete and Gold", "Foo Fighters", "Sonic Highways"),
  c(), FALSE)
## Joining, by = "word"
fooFightersReg$Artist = "Foo Fighters"
jColeReg = completeArchDf("j Cole", c("j Cole"),
  c("Revenge of the Dreamers Iii", "Kod", "2014 Forest Hills Drive", "Cole World: The Sideline Story", "4 your Eyez Only", "Born Sinner", "The Blow Up"),
  c(), FALSE)
## Joining, by = "word"
print(jColeReg)
## Name ReleaseDate Label
## 1 4 your Eyez Only 2016-12-09 <NA>
## 2 Album of the Year 2018-08-07 <NA>
## 3 Apparently 2014-12-09 <NA>
## 4 Atm 2018-04-20 <NA>
## 5 Brackets 2018-04-20 <NA>
## 6 Can't Get Enough 2011-09-01 Columbia
## 7 Change 2016-12-09 <NA>
## 8 Crooked Smile 2013-06-04 Columbia
## 9 Deja Vu 2016-12-09 <NA>
## 10 Everybody Dies 2016-12-02 <NA>
## 11 False Prophets 2016-12-01 <NA>
## 12 Foldin Clothes 2016-12-09 <NA>
## 13 For Whom the Bell Tolls 2016-12-09 <NA>
## 14 Friends 2018-04-20 <NA>
## 15 Immortal 2016-12-09 <NA>
## 16 Intro 2018-04-20 <NA>
## 17 Intro 2018-04-20 <NA>
## 18 Intro 2018-04-20 <NA>
## 19 Intro 2018-04-20 <NA>
## 20 Intro 2018-04-20 <NA>
## 21 Intro 2018-04-20 <NA>
## 22 Intro 2018-04-20 <NA>
## 23 Intro 2018-04-20 <NA>
## 24 Intro 2018-04-20 <NA>
## 25 Kevin's Heart 2018-04-20 <NA>
## 26 Kod 2018-04-20 <NA>
## 27 Middle Child 2019-01-23 Dreamville / Roc Nation
## 28 Motiv8 2018-04-20 <NA>
## 29 Neighbors 2016-12-09 <NA>
## 30 No Role Modelz 2014-12-09 Columbia
## 31 Nobody's Perfect 2012-02-07 <NA>
## 32 Once an Addict 2018-04-20 <NA>
## 33 Photograph 2018-04-20 <NA>
## 34 Power Trip 2013-02-14 Columbia
## 35 The Cut Off 2018-04-20 <NA>
## 36 Ville Mentality 2016-12-08 <NA>
## 37 Wet Dreamz 2014-12-09 Columbia
## 38 Who Dat 2010-05-31 <NA>
## 39 Window Pain 2018-04-20 <NA>
## 40 Work Out 2011-06-15 Roc Nation / Columbia
## Album pop1 pop2 pop3 pop4
## 1 4 your Eyez Only 0.03448276 0.03448276 4.278054 4.278054
## 2 <NA> 0.01149425 0.01149425 2.646175 2.646175
## 3 2014 Forest Hills Drive 1.93047180 0.22212162 3.763523 3.076983
## 4 Kod 0.38752898 0.23544230 4.554929 3.715292
## 5 Kod 0.03333333 0.03333333 4.264087 4.264087
## 6 Cole World: The Sideline Story 3.41457134 0.32037345 3.893859 3.609290
## 7 4 your Eyez Only 0.07171543 0.05966724 4.383276 3.639594
## 8 Born Sinner 5.52643770 0.47061960 4.305416 3.980937
## 9 4 your Eyez Only 3.46526138 0.48440854 4.544358 3.571305
## 10 <NA> 0.01754386 0.01754386 3.786460 3.786460
## 11 <NA> 0.04778973 0.02927121 3.852273 2.972069
## 12 4 your Eyez Only 0.05374150 0.04353741 4.264087 2.697745
## 13 4 your Eyez Only 0.06498364 0.05423095 4.357990 3.224927
## 14 <NA> 0.02173913 0.02173913 4.009150 4.009150
## 15 4 your Eyez Only 0.26925192 0.14506853 4.500920 3.220235
## 16 Cole World: The Sideline Story 0.01886792 0.01886792 3.873282 3.873282
## 17 Cole World: The Sideline Story 0.01886792 0.01886792 3.873282 3.873282
## 18 Cole World: The Sideline Story 0.01886792 0.01886792 3.873282 3.873282
## 19 Kod 0.01886792 0.01886792 3.873282 3.873282
## 20 Kod 0.01886792 0.01886792 3.873282 3.873282
## 21 Kod 0.01886792 0.01886792 3.873282 3.873282
## 22 2014 Forest Hills Drive 0.01886792 0.01886792 3.873282 3.873282
## 23 2014 Forest Hills Drive 0.01886792 0.01886792 3.873282 3.873282
## 24 2014 Forest Hills Drive 0.01886792 0.01886792 3.873282 3.873282
## 25 <NA> 0.33246960 0.18895264 4.533674 3.600024
## 26 Kod 1.29504874 0.28727103 4.511958 3.699885
## 27 Revenge of the Dreamers Iii 20.88688824 2.16905996 4.575741 4.458565
## 28 Kod 0.09483568 0.08075117 4.455509 3.930017
## 29 4 your Eyez Only 0.36849758 0.15365051 4.478473 3.652004
## 30 2014 Forest Hills Drive 7.11264959 0.48206565 4.175925 3.653336
## 31 Cole World: The Sideline Story 2.40644699 0.24984939 3.691376 2.901888
## 32 <NA> 0.02127660 0.02127660 3.990834 3.990834
## 33 Kod 0.09642857 0.08392857 4.467057 3.758165
## 34 Born Sinner 16.53854529 1.00535499 4.407938 4.049711
## 35 <NA> 0.03571429 0.03571429 4.291828 4.291828
## 36 4 your Eyez Only 0.06388889 0.05277778 4.345103 3.376024
## 37 2014 Forest Hills Drive 2.84362761 0.26636225 3.691376 3.073159
## 38 <NA> 0.01075269 0.01075269 2.091864 2.091864
## 39 <NA> 0.02439024 0.02439024 4.096010 4.096010
## 40 Cole World: The Sideline Story 18.44841715 1.13219629 4.478473 4.024805
## nonBandMemberWriters lyricalComplexity musicalComplexity
## 1 0 13.93350 1
## 2 3 13.97051 1
## 3 0 11.57687 1
## 4 2 13.13359 1
## 5 4 12.69375 1
## 6 1 12.81370 1
## 7 0 13.94342 1
## 8 1 13.28807 1
## 9 0 13.01123 1
## 10 0 14.27632 1
## 11 0 13.94164 1
## 12 0 11.90967 1
## 13 0 10.66431 1
## 14 0 14.11279 1
## 15 0 12.52378 1
## 16 0 17.98508 1
## 17 0 14.99689 1
## 18 0 16.08843 1
## 19 0 17.98508 1
## 20 0 14.99689 1
## 21 0 16.08843 1
## 22 0 17.98508 1
## 23 0 14.99689 1
## 24 0 16.08843 1
## 25 0 12.52765 1
## 26 0 13.36855 1
## 27 0 13.48261 1
## 28 0 13.18835 1
## 29 2 13.21251 1
## 30 0 12.69324 1
## 31 2 12.68357 1
## 32 0 13.90294 1
## 33 0 12.21598 1
## 34 0 12.52172 1
## 35 1 12.96982 1
## 36 0 11.50065 1
## 37 2 13.00303 1
## 38 4 13.60967 1
## 39 0 12.99910 1
## 40 6 11.72769 1
## totalComplexity
## 1 14.93350
## 2 14.97051
## 3 12.57687
## 4 14.13359
## 5 13.69375
## 6 13.81370
## 7 14.94342
## 8 14.28807
## 9 14.01123
## 10 15.27632
## 11 14.94164
## 12 12.90967
## 13 11.66431
## 14 15.11279
## 15 13.52378
## 16 18.98508
## 17 15.99689
## 18 17.08843
## 19 18.98508
## 20 15.99689
## 21 17.08843
## 22 18.98508
## 23 15.99689
## 24 17.08843
## 25 13.52765
## 26 14.36855
## 27 14.48261
## 28 14.18835
## 29 14.21251
## 30 13.69324
## 31 13.68357
## 32 14.90294
## 33 13.21598
## 34 13.52172
## 35 13.96982
## 36 12.50065
## 37 14.00303
## 38 14.60967
## 39 13.99910
## 40 12.72769
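In the printed table, rows 16 to 24 are all named "Intro" and share the same popularity values (pop1 = 0.01886792). Their album and lyrical complexity, however, cycle through three combinations. This pattern suggests that the join matched one charting "Intro" against several album tracks with the same title. The minimal check below lists every title that appears more than once, so that such rows can be inspected before artists are compared. It runs on toy rows with values rounded from the table above. The names `songs`, `dupNames`, and `dupRows` are hypothetical.

```r
# Assumption: toy rows mirroring the printed columns (values rounded, illustrative only)
songs <- data.frame(
  Name = c("Crooked Smile", "Intro", "Intro", "Intro", "Power Trip"),
  Album = c("Born Sinner", "Cole World: The Sideline Story", "Kod",
            "2014 Forest Hills Drive", "Born Sinner"),
  pop1 = c(5.53, 0.019, 0.019, 0.019, 16.54)
)

# Song titles that occur more than once
dupNames <- unique(songs$Name[duplicated(songs$Name)])

# All rows carrying one of those titles, for manual inspection
dupRows <- songs[songs$Name %in% dupNames, ]
print(dupRows)
```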
jColeReg$Artist = "J Cole"
justinTimberlakeReg = completeArchDf("Justin Timberlake", c("Justin Timberlake"),
  c("Justified", "Man of the Woods", "The 20/20 Experience - 2 of 2 (Deluxe)", "The 20/20 Experience (Deluxe Version)", "Futuresex/Lovesounds Deluxe Edition"),
  c(), FALSE)
## Joining, by = "word"
justinTimberlakeReg$Artist = "Justin Timberlake"
artistCompare = function(artistDfs) {
  # Stack the per-artist data frames into one data frame
  fullDf = bind_rows(artistDfs)
  artists = unique(fullDf$Artist)
  # Title parts: all artists but the last (comma-separated), then the last one
  artist1 = paste(artists[-length(artists)], collapse = ", ")
  artist2 = tail(artists, n = 1)
  # Each panel: points plus a per-artist linear trend line against release date
  popPlot = ggplot(fullDf, aes(x = ReleaseDate, y = pop1, color = Artist, shape = Artist)) +
    geom_smooth(method = "lm", se = FALSE) + geom_point(alpha = 0.5) + labs(x = "Release Date", y = "Popularity Metric (AU)")
  compPlot = ggplot(fullDf, aes(x = ReleaseDate, y = lyricalComplexity, color = Artist, shape = Artist)) +
    geom_smooth(method = "lm", se = FALSE) + geom_point(alpha = 0.5) + labs(x = "Release Date", y = "Lyrical Complexity Metric (AU)")
  infPlot = ggplot(fullDf, aes(x = ReleaseDate, y = nonBandMemberWriters, color = Artist, shape = Artist)) +
    geom_smooth(method = "lm", se = FALSE) + geom_point(alpha = 0.5) + labs(x = "Release Date", y = "Number of Outside Writers")
  # paste0 avoids the doubled spaces that paste() would insert around each piece
  grid.arrange(popPlot, compPlot, infPlot, ncol = 2, nrow = 2,
    top = paste0("Popularity, Lyrical Complexity, and Outside Influence of ", artist1, " and ", artist2, " Songs Over Time"))
}
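In `artistCompare`, the title is built from `artist1` and `artist2`. When only one artist is passed, `artist1` is an empty string, and the title is left with a dangling "and". The sketch below is a standalone helper that handles both cases. The function name `comparisonTitle` is hypothetical and does not appear in `frostFunctions.R`.

```r
# Hypothetical helper: build the comparison title from a vector of artist names
comparisonTitle <- function(artists) {
  artistList <- if (length(artists) == 1) {
    artists  # a single artist needs no "and"
  } else {
    # "A, B and C": all but the last comma-separated, then " and " plus the last
    paste0(paste(head(artists, -1), collapse = ", "), " and ", tail(artists, 1))
  }
  paste0("Popularity, Lyrical Complexity, and Outside Influence of ",
         artistList, " Songs Over Time")
}

comparisonTitle(c("Maroon 5", "Foo Fighters", "Justin Timberlake"))
comparisonTitle("Taylor Swift")
```

Inside `artistCompare`, the `top` argument could then be `comparisonTitle(artists)`.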
artistCompare(list(maroon5Reg, fooFightersReg, justinTimberlakeReg))
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
artistCompare(list(fooFightersReg, jColeReg))
artistCompare(list(maroon5Reg, taylorSwiftReg))
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
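Both comparisons that include Maroon 5 report one removed row. The Foo Fighters and J Cole comparison reports none, so the missing value most likely sits in the Maroon 5 data. The minimal sketch below counts missing values in each plotted column, split by artist, which locates such rows before plotting. It uses a toy combined data frame with one deliberately missing value. The names `fullDf`, `plotCols`, and `naCounts` are hypothetical here.

```r
# Assumption: toy combined data frame with one missing outside-writer count
fullDf <- data.frame(
  Artist = c("Maroon 5", "Maroon 5", "Foo Fighters", "Foo Fighters"),
  pop1 = c(1.2, 0.4, 0.8, 2.1),
  lyricalComplexity = c(13.1, 12.4, 11.9, 12.7),
  nonBandMemberWriters = c(2, NA, 0, 1)
)

plotCols <- c("pop1", "lyricalComplexity", "nonBandMemberWriters")

# Missing values per plotted column (rows) for each artist (columns)
naCounts <- sapply(split(fullDf[plotCols], fullDf$Artist),
                   function(d) colSums(is.na(d)))
print(naCounts)
```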